[codegen] Add max(half, half) support when enable fp16 #3811
Conversation
Hi @ZQPei, please check the CI error.

Hi @cchung100m

Hi @ZQPei For your PR, besides

How do I add a unit test?
Hi @cchung100m, here is the unit test:

```python
import numpy as np
import tvm
from tvm.contrib.nvcc import have_fp16


def test_cuda_vector_max():
    num_thread = 8
    target = 'cuda'

    def check_vector_max(ctx, n, dtype):
        if not tvm.gpu(0).exist or not tvm.module.enabled("cuda"):
            print("skip because cuda is not enabled..")
            return
        if dtype == "float16" and not have_fp16(tvm.gpu(0).compute_version):
            print("skip because gpu does not support fp16")
            return
        # Elementwise max of two input tensors.
        A = tvm.placeholder((n,), name='A', dtype=dtype)
        B = tvm.placeholder((n,), name='B', dtype=dtype)
        C = tvm.compute((n,), lambda i: tvm.max(A[i], B[i]), name='C')
        s = tvm.create_schedule(C.op)
        bx, tx = s[C].split(C.op.axis[0], factor=num_thread)
        s[C].bind(bx, tvm.thread_axis("blockIdx.x"))
        s[C].bind(tx, tvm.thread_axis("threadIdx.x"))
        fun = tvm.build(s, [A, B, C], "cuda", name="vector_max")
        np_a = np.random.uniform(size=n).astype(dtype)
        np_b = np.random.uniform(size=n).astype(dtype)
        np_c = np.maximum(np_a, np_b)
        a = tvm.nd.empty((n,), A.dtype, ctx).copyfrom(np_a)
        b = tvm.nd.empty((n,), B.dtype, ctx).copyfrom(np_b)
        c = tvm.nd.empty((n,), C.dtype, ctx)
        fun(a, b, c)
        np.testing.assert_equal(c.asnumpy(), np_c)

    ctx = tvm.context(target, 0)
    check_vector_max(ctx, 10, "float32")
    check_vector_max(ctx, 10, "float16")
```
LGTM
```diff
@@ -50,6 +50,8 @@ void CodeGenCUDA::AddFunction(LoweredFunc f) {
 std::string CodeGenCUDA::Finish() {
   if (enable_fp16_) {
     decl_stream << "#include <cuda_fp16.h>\n";
+    decl_stream << "__device__ half max(const half a, const half b)\n"
```
Do we know which operators we have to overload as such? "max" is one of them. Do we need others?
For now, max is the only function I have found that needs to be overloaded.
BTW, I have a question about the checks: why does this commit fail to build today? It built successfully yesterday.
Hi, we saw more failures while trying to run a full ResNet.
I think we are missing all the reduce ops. Would it be possible for you to help with this? (In a separate PR; this one is good to go.)
Hi @cchung100m @anijain2305
Force-pushed from 1dc314e to 09ea9ab.

Commit message:

Fix the following error when compiling a float16 model:

```
/tmp/tmpz_0pydlm/my_kernel.cu(9890): error: more than one instance of overloaded function "max" matches the argument list:
            function "max(int, int)"
            function "max(unsigned int, unsigned int)"
            function "max(int, unsigned int)"
            function "max(unsigned int, int)"
            function "max(long, long)"
            function "max(unsigned long, unsigned long)"
            function "max(long, unsigned long)"
            function "max(unsigned long, long)"
            function "max(long long, long long)"
            function "max(unsigned long long, unsigned long long)"
            function "max(long long, unsigned long long)"
            function "max(unsigned long long, long long)"
            function "max(float, float)"
            argument types are: (half, __half)
```

Squashed commits:
- add max(half, half) support when enable fp16
- fix cpplint error
- add max(half, half) support when enable fp16
- fix cpplint error, replace tab with whitespace
- add unittest for vector_max
- add unittest for vector_max
- add max(half, half) support when enable fp16
- add max(half, half) support when enable fp16
@ZQPei please also add the test case to this PR.
Hi @ZQPei, please first post in discuss.tvm.ai and provide more details of what you're doing. Currently, it is not clear where the problem is.
Hi, just a ping to get this in :)
Fix the error shown above, which occurs when compiling a float16 model with CUDA.
Please check!